dSCAM: Finding Document Copies Across Multiple Databases

Authors

  • Hector Garcia-Molina
  • Luis Gravano
  • Narayanan Shivakumar
Abstract

The advent of the Internet has made the illegal dissemination of copyrighted material easy. An important problem is how to automatically detect when a "new" digital document is "suspiciously close" to existing ones. The SCAM project at Stanford University has addressed this problem when there is a single registered-document database. However, in practice, text documents may appear in many autonomous databases, and one would like to discover copies without having to exhaustively search in all databases. Our approach, dSCAM, is a distributed version of SCAM that keeps succinct metainformation about the contents of the available document databases. Given a suspicious document S, dSCAM uses its information to prune all databases that cannot contain any document that is close enough to S, and hence the search can focus on the remaining sites. We also study how to query the remaining databases so as to minimize different querying costs. We empirically study the pruning and searching schemes, using a collection of 50 databases and two sets of test documents.
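
The pruning idea in the abstract can be illustrated with a small sketch. The abstract does not specify dSCAM's metainformation or its closeness measure, so everything below is an assumption made for illustration: each database is summarized by the maximum count of each word in any single document, closeness is a simple word-overlap fraction, and the names `summarize`, `overlap_upper_bound`, and `candidate_databases` are hypothetical. The point is only that a per-database upper bound on closeness lets a database be skipped safely when the bound falls below the threshold.

```python
from collections import Counter


def word_counts(text):
    """Count whitespace-separated, lowercased words in a text."""
    return Counter(text.lower().split())


def summarize(documents):
    """Hypothetical database summary (not dSCAM's actual metainformation):
    for each word, the largest count it reaches in any single document."""
    summary = {}
    for doc in documents:
        for word, count in word_counts(doc).items():
            summary[word] = max(summary.get(word, 0), count)
    return summary


def overlap(suspicious, doc):
    """Assumed closeness measure: fraction of the suspicious document's
    word occurrences that also occur in doc."""
    s, d = word_counts(suspicious), word_counts(doc)
    total = sum(s.values())
    if total == 0:
        return 0.0
    return sum(min(c, d.get(w, 0)) for w, c in s.items()) / total


def overlap_upper_bound(suspicious, summary):
    """Upper bound on overlap(suspicious, doc) for every doc in the
    summarized database, computed from the summary alone."""
    s = word_counts(suspicious)
    total = sum(s.values())
    if total == 0:
        return 0.0
    return sum(min(c, summary.get(w, 0)) for w, c in s.items()) / total


def candidate_databases(suspicious, summaries, threshold):
    """Keep only databases whose bound reaches the threshold; the rest
    cannot hold a document that is close enough under this measure."""
    return [name for name, summary in summaries.items()
            if overlap_upper_bound(suspicious, summary) >= threshold]


# Toy data, invented for this example.
databases = {
    "a": ["the quick brown fox jumps", "lazy dogs sleep"],
    "b": ["stock prices fell sharply today"],
}
summaries = {name: summarize(docs) for name, docs in databases.items()}
suspicious = "the quick brown fox sleeps"
remaining = candidate_databases(suspicious, summaries, threshold=0.5)
```

Because the summary takes a per-word maximum, the bound never underestimates the overlap with any stored document, so pruning in this sketch cannot discard a database that holds a close copy. It can keep databases that turn out to hold none, which is why the remaining sites still have to be queried.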


Similar resources

Dscam and DSCAM: complex genes in simple animals, complex animals yet simple genes.

Cadherins and the immunoglobulin (Ig) proteins give rise to a multitude of surface receptors, which function as diverse cell adhesion molecules (CAMs) or signal-transducing receptors. These functions are often interdependent, and rely on a high degree of specificity in homophilic binding as well as heterophilic interactions. The Drosophila receptor Dscam is an exceptional example of homophilic ...


DSCAM Is a Netrin Receptor that Collaborates with DCC in Mediating Turning Responses to Netrin-1

During nervous system development, spinal commissural axons project toward and across the ventral midline. They are guided in part by netrin-1, made by midline cells, which attracts the axons by activating the netrin receptor DCC. However, previous studies suggest that additional receptor components are required. Here, we report that the Down's syndrome Cell Adhesion Molecule (DSCAM), a candida...


Transmembrane/Juxtamembrane Domain-Dependent Dscam Distribution and Function during Mushroom Body Neuronal Morphogenesis

Besides 19,008 possible ectodomains, Drosophila Dscam contains two alternative transmembrane/juxtamembrane segments, respectively, derived from exon 17.1 and exon 17.2. We wondered whether specific Dscam isoforms mediate formation and segregation of axonal branches in the Drosophila mushroom bodies (MBs). Removal of various subsets of the 12 exon 4s does not affect MB neuronal morphogenesis, wh...


The lncRNA landscape of breast cancer reveals a role for DSCAM-AS1 in breast cancer progression

Molecular classification of cancers into subtypes has resulted in an advance in our understanding of tumour biology and treatment response across multiple tumour types. However, to date, cancer profiling has largely focused on protein-coding genes, which comprise <1% of the genome. Here we leverage a compendium of 58,648 long noncoding RNAs (lncRNAs) to subtype 947 breast cancer samples. We sho...


Drosophila Dscam Is Required for Divergent Segregation of Sister Branches and Suppresses Ectopic Bifurcation of Axons

Axon bifurcation results in the formation of sister branches, and divergent segregation of the sister branches is essential for efficient innervation of multiple targets. From a genetic mosaic screen, we find that a lethal mutation in the Drosophila Down syndrome cell adhesion molecule (Dscam) specifically perturbs segregation of axonal branches in the mushroom bodies. Single axon analysis furt...



Publication date: 1996